Proceedings of The National Conference on Man Machine Interaction 2014 [NCMMI 2014]



Secure Data Management And Transaction of 
Heterogeneous Large Data Sets In Grid 
Computing Environment 



K. Ashokkumar M.E, Dr. C. Chandra sekar Ph.D 

Computer Science & Engineering Department, Sathyabama University. 
Professor, Dept of Computer Science, Periyar University 






Abstract - Data management is one of the most difficult and persistent problems in grid environments. Grid computing systems and their applications deal with very large information sets, owing to heterogeneous grid resources that span organizations and locations with totally different access policies. Providing security for grid resources is far from simple; it is a tough task. With this abundant interest, security becomes necessary to provide authentication, authorization, resource protection, secure communication, information confidentiality, information integrity, trust policy management, user key and document management, service protection, and network security. In this proposal we plan to provide security for the resources using a reliable security mechanism that preserves the heterogeneous information present within distributed systems. We also focus on how to efficiently integrate and process data sets from the heterogeneous environment; information integration strategies, which identify records belonging to the same person who resides in multiple locations, are essential to these efforts.
Keywords - large data set, heterogeneous resources.



I. Grid Computing

There has been a surge of interest in grid computing, a way to enlist large numbers of machines to work on multipart computational problems such as circuit analysis or mechanical design. There are excellent reasons for this attention among scientists, engineers, and business executives. Grid computing enables the use and pooling of computer and data resources to solve complex mathematical problems. The technique is the latest development in an evolution that earlier brought forth such advances as distributed computing, the worldwide web, and collaborative computing.

Here we need a supervised system with enough memory and good computation power, but no single system in practice meets these expectations. So we need to combine a few systems to form one cluster system, which can run larger applications. But our problem is that the data sets are all in different places, cluster formation is possible only with a few systems, and we also need more processing and computation power than a cluster can be expected to provide.

To avoid this problem, we can go for a grid system, which may be described as a group of clusters. In this grid system there are multiple clusters, one for each data set, so execution in the form of searching and matching is possible within every data set; because each data set has its own cluster system, execution is easy. Finally, the interdomain security solutions used for grids should be able to interoperate with, rather than replace, the varied intradomain access management technologies inevitably encountered in individual domains [1], [2], [9].
Fig.1 Grid containing [9]
II. Existing Information Retrieval Methods and Security in Grid Computing
Information Retrieval & Search Algorithms:

These methods must address the following characteristics:

1) Accuracy: The sheer volume of information now stored in electronic media magnifies deficiencies in recall (the percentage of relevant information retrieved). Giving the user a reasonable-sized response with high precision can mean missing hundreds of relevant texts.

2) Speed: As the quantity of text that must be searched increases, the speed of searching can become a bottleneck. In practical terms, the need for fast search means that more computationally intensive processing, such as NLP techniques, must either be applied very selectively or run as "batch" indexing tools prior to retrieval.

3) Consistency: Many information retrieval environments require indexing of the text by groups of indexers or by the authors. This leads to a decrease in accuracy from the inevitable inconsistencies, which automatic processing could help to avoid.

4) Ease of use: The growth of personal computers has made obsolete the traditional model of information retrieval with a trained human intermediary, giving system responsiveness a high priority. [9]

A. Existing Method 

There are several types of identifiers that can be used to link up with the upcoming enormous identification system known as India's Large dataset. For example, they can be: driver license, ration card, election photo identity card, PAN card, passport, health insurance card, bank account number, post office account number, mobile phone number, landline phone number, and email addresses. These are the details the card holder may provide to the authority. [3] [13]

India's large dataset card is based on a 16-digit number, of which only 12 digits are shown to the public; the remaining 4 digits are used for official purposes.

Upon loss of the UID number, the card holder will have to go through a series of processes for the identity check and, since this process is somewhat expensive, may have to pay a small sum of money to the authority.

India's large dataset number schema will actually have not 12 digits but only 11 digits. The first number will be the implicit version number, while the second will be the check digit. So the large dataset number will only have 11 digits that really matter.



The numbers in the UID will be non-repeating, non-traceable, and unpredictable, and will be generated through computer algorithms. [13]
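The text mentions a check digit but does not say how it is computed. As an assumption, the sketch below uses the Verhoeff algorithm, a common check-digit scheme for numeric identifiers that catches all single-digit errors and adjacent transpositions; the function names `check_digit` and `is_valid` are hypothetical.

```python
# Verhoeff check-digit scheme (assumed here; the paper does not name one).
# D: multiplication table of the dihedral group D5.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# P: position-dependent permutation table (row i is the first row applied i times).
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
# INV: inverse of each element of D5.
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def check_digit(payload: str) -> str:
    """Return the Verhoeff check digit to append to a string of digits."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def is_valid(number: str) -> bool:
    """True if the last digit of number is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


body = "236"
full = body + check_digit(body)
print(full, is_valid(full))  # 2363 True
```

Any single mistyped digit in a stored ID then fails `is_valid` before a lookup is attempted.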



B. Access Method 


Accessing these data sets is costly, because we have to refer to all of them concurrently in order to get their information. Each data set is located in a different place, and we cannot expect uniformity across them because each data set has its own behavior.

Here we have to concentrate primarily on one thing: data accuracy. Users get information from different data sets, which are very large and contain a huge number of entries, so information retrieved from them may contain duplicates, such as redundant data and modified data.
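The duplicate problem described above can be illustrated with a small sketch: records pulled from several data sets are grouped by a key; identical copies are collapsed as redundant data, while records that share a key but differ in content are flagged as modified data. The record layout, the `holder_id` field name, and the sample values are hypothetical.

```python
from collections import defaultdict


def merge_records(sources):
    """Group records from several data sets by key.

    sources: mapping of data-set name -> list of dict records, each carrying
    a 'holder_id' key (hypothetical field name).
    Returns (merged, conflicts): merged maps holder_id -> the single record
    when all copies agree; conflicts maps holder_id -> the distinct variants.
    """
    by_key = defaultdict(list)
    for records in sources.values():
        for rec in records:
            by_key[rec["holder_id"]].append(rec)

    merged, conflicts = {}, {}
    for key, copies in by_key.items():
        # Exact duplicates (redundant data) collapse to one variant.
        variants = []
        for rec in copies:
            if rec not in variants:
                variants.append(rec)
        if len(variants) == 1:
            merged[key] = variants[0]
        else:
            # Same key, different content: treat as modified data.
            conflicts[key] = variants
    return merged, conflicts


sources = {
    "site_a": [{"holder_id": "101", "name": "Ravi"}, {"holder_id": "102", "name": "Mala"}],
    "site_b": [{"holder_id": "101", "name": "Ravi"}, {"holder_id": "102", "name": "Malathi"}],
}
merged, conflicts = merge_records(sources)
print(sorted(merged))     # ['101']
print(sorted(conflicts))  # ['102']
```

Conflicting variants are returned rather than silently resolved, since deciding which copy is current needs information (timestamps, source priority) the paper does not describe.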

The second concern is access speed. Each data set is located in a different place, so accessing the entire data set from a single place is not easy; it needs substantial algorithms to implement.

C. Grid Security 

We introduce the grid security problem with an example illustrated in Figure 1.1. This example, though somewhat contrived, captures the necessary components of real applications.




III. Security Needs 

Grid systems and applications may need any or all of the standard security functions, including authentication, access control, integrity, privacy, and nonrepudiation. In this paper, we focus primarily on problems with authentication and access control. Specifically, we seek to (1) provide authentication solutions that enable a user, the processes that comprise a user's computation, and the resources used by those processes to verify each other's identity; and (2) enable local access control mechanisms to be applied without modification, whenever possible. As mentioned in Section four, authentication forms the foundation of a security policy that permits numerous local security policies to be integrated into a global framework.

In developing a security architecture that meets these needs, we also choose to satisfy the following constraints derived from the characteristics of the grid environment and grid applications:
Single sign-on: A user should be able to authenticate once (e.g., when starting a computation) and initiate computations that acquire resources, use resources, release resources, and communicate internally, without further authentication of the user.



Protection of credentials: User credentials (passwords, private keys, etc.) must be protected.

Interoperability with local security solutions: While our security solutions may provide interdomain access mechanisms, access to local resources will generally be determined by a local security policy that is enforced by a local security mechanism. It is impractical to modify every local resource to accommodate interdomain access; instead, one or more entities in a domain (e.g., interdomain security servers) must act as agents of remote clients/users for local resources.

Exportability: We require that the code be (a) exportable and (b) executable in international testbeds. In short, the exportability issues mean that our security policy cannot directly or indirectly require the use of bulk encryption.






Uniform credentials/certification infrastructure: Interdomain access requires, at a minimum, a standard method of expressing the identity of a security principal such as an actual user or a resource. Hence, it is imperative to use a standard (such as X.509v3) for encoding credentials for security principals.

Support for secure group communication: A computation can comprise a number of processes that will need to coordinate their activities as a group. The composition of a process group can and will change during the lifetime of a computation. Hence, support is required for secure (in this context, authenticated) communication for dynamic groups. No current security solution supports this feature; the …-API has no provisions for group security contexts.



Support for multiple implementations: The security policy should not dictate a specific implementation technology. Rather, it should be possible to implement the security policy with a range of security technologies, based on both public and shared key cryptography.




Fig.2 Grid Security 
IV. Proposed Method

Accessing these data sets is costly, because we have to refer to all of them concurrently in order to get their information. Each data set is located in a different place, and we cannot expect uniformity across them because each data set has its own behavior. If we need all the data sets to respond quickly and correctly, we have to convert them into a similar style: create or modify all existing data sets under more or less similar conditions, so that their working style, the way they are accessed, and the way they respond are the same.

Here we create a 13-digit ID as the large dataset card number, where each "digit" a1 to a13 denotes one field:

[a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13]

• a1 - state
• a2 - district
• a3 - taluk
• a4 - village
• a5 - street
• a6 - door number
• a7 - name of holder
• a8 - date of birth of holder
• a9 - gender
• a10 - license number
• a11 - ration card number
• a12 - PAN card number
• a13 - passport number

In this method, the passport data set also contains the mobile number, landline number, and e-mail address.



Our proposed work is to assign an ID to each state, each district, each taluk, each village, and each street. Once IDs are assigned, we can easily retrieve information from their data sets. To do this, we have to design the data sets in the form of [a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 …].

The first nine fields should be the same for all data sets; the next field changes depending on the data set's pattern. So we have to enter data into each data set in the above format. Take the license database as an example: here a10 indicates the license number. How can we achieve this? When we first enter the license information in the data set, it should follow the above order: state, district, taluk, village, street, door number, holder's name, date of birth, and gender, and at last the license card number. If we have the number in this form, we can retrieve the license number of a holder from the license card data set very easily by matching the first nine digits of both the large dataset card number and the license card number.

Similarly, in the ration card data set, a11 indicates the ration card number. If we store ration card information in the ration card data set, we can get information by comparing the first nine digits of both the large dataset card and the ration card. Because the first nine digits are common to all data sets, only the next field changes depending on the data set's information: in the ration card data set it holds the ration card number of a particular holder, and in the PAN card data set the a12 field holds the PAN card number. Similarly, in the passport data set, the 13th field indicates the passport number. We can do one more thing here: when we build the passport data set, we include the holder's optional information such as landline number, mobile number, and e-mail address. When we retrieve the passport information of a particular holder, it also retrieves the mobile number, landline number, and e-mail address.
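The record layout described above can be sketched as follows: every data set stores the nine common fields a1 to a9 as a prefix, followed by its own field (a10 in the license data set), so retrieval reduces to matching the nine-field prefix. The sample values, tuple representation, and function names are hypothetical; the paper does not fix field widths or storage format.

```python
COMMON_FIELDS = 9  # a1..a9: state, district, taluk, village, street, door, name, DOB, gender


def prefix(card_id):
    """Return the nine common fields (a1..a9) of an ID given as a tuple."""
    return tuple(card_id[:COMMON_FIELDS])


def lookup(dataset, card_id):
    """Return the data-set-specific field for the holder whose a1..a9 match."""
    key = prefix(card_id)
    for record in dataset:
        if prefix(record) == key:
            return record[COMMON_FIELDS]  # a10 in the license data set
    return None  # holder not present in this data set


# Hypothetical holder: location codes, then name, date of birth, gender.
holder = ("33", "05", "02", "11", "07", "12", "RAVI", "19900101", "M")
license_dataset = [
    holder + ("TN0520190001",),
    ("33", "05", "02", "11", "07", "13", "MALA", "19920305", "F", "TN0520190002"),
]
print(lookup(license_dataset, holder))  # TN0520190001
```

The same `lookup` works unchanged for the ration card, PAN card, and passport data sets, because only the field after the common prefix differs.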



Finally, our work is to represent all the information in the data sets as digits, and to assign a number to each state, district, taluk, village, and street; these numbers act as the primary key. A primary key is an attribute (or combination of attributes) whose value is both unique and not null. So we assign the state, district, taluk, village, and street numbers as the primary key. Once they are assigned, we can retrieve the information with a query statement and display it easily.
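A minimal sketch of the primary-key idea using Python's built-in `sqlite3` module: the numeric location codes form a composite primary key, and a parameterized query on that key retrieves the holder's record. The table and column names are hypothetical, and adding the door number to the key is an assumption so that two holders on the same street remain distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite permits NULL in non-INTEGER primary-key columns unless NOT NULL is
# declared, so NOT NULL is spelled out to match the "unique and not null" rule.
conn.execute(
    """CREATE TABLE license (
           state INTEGER NOT NULL, district INTEGER NOT NULL,
           taluk INTEGER NOT NULL, village INTEGER NOT NULL,
           street INTEGER NOT NULL, door INTEGER NOT NULL,
           license_no TEXT,
           PRIMARY KEY (state, district, taluk, village, street, door)
       )"""
)
conn.executemany(
    "INSERT INTO license VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(33, 5, 2, 11, 7, 12, "TN0520190001"),
     (33, 5, 2, 11, 7, 13, "TN0520190002")],
)

# Query by the composite key (placeholders avoid SQL injection).
row = conn.execute(
    "SELECT license_no FROM license WHERE state=? AND district=? AND taluk=? "
    "AND village=? AND street=? AND door=?",
    (33, 5, 2, 11, 7, 13),
).fetchone()
print(row[0])  # TN0520190002

# A second row with the same key violates the PRIMARY KEY constraint.
try:
    conn.execute("INSERT INTO license VALUES (33, 5, 2, 11, 7, 13, 'DUP')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```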
"Our method is just a one-time effort; once it is done, we can use our system lifelong."
Efficient method: To find / locate the database

• Set an ID length of 13 digits or more for the database.

• Assign two digits to each location field, and use them to identify the database.

• Use the unique-ID method (the solution to the identified problem) to find the required database.

• When searching, use a specific searching algorithm to make the search quicker and easier.

• Apply a parallel computation algorithm to process faster.

• This can run in parallel over all the existing databases to retrieve the massive information.
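The bullet about two digits per location suggests fixed-width location codes. The sketch below assumes each of the five location fields (a1 to a5) is encoded as exactly two digits, so the location part of an ID can be packed into, and split back out of, a 10-digit string without separators. The field names and the uniform two-digit width are assumptions.

```python
LOCATION_FIELDS = ("state", "district", "taluk", "village", "street")
WIDTH = 2  # assumed: two digits per location field


def pack_location(codes):
    """Encode location codes as one fixed-width digit string (33, 5 -> '3305')."""
    parts = []
    for name in LOCATION_FIELDS:
        value = codes[name]
        if not 0 <= value < 10 ** WIDTH:
            raise ValueError(f"{name} code {value} does not fit in {WIDTH} digits")
        parts.append(f"{value:0{WIDTH}d}")  # zero-pad to the fixed width
    return "".join(parts)


def unpack_location(digits):
    """Split a packed digit string back into named location codes."""
    if len(digits) != WIDTH * len(LOCATION_FIELDS):
        raise ValueError("unexpected length")
    return {
        name: int(digits[i * WIDTH:(i + 1) * WIDTH])
        for i, name in enumerate(LOCATION_FIELDS)
    }


codes = {"state": 33, "district": 5, "taluk": 2, "village": 11, "street": 7}
packed = pack_location(codes)
print(packed)                            # 3305021107
print(unpack_location(packed) == codes)  # True
```

With two digits, each level can hold at most 100 codes (00 to 99); levels with more entries, such as streets in a large town, would need a wider field.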

A. Advantage



Less accessing time: 

All the data sets are created in the same manner, and their details are presented in the form of an ID. We can retrieve that information by ID easily and quickly. Wherever a data set is present, we use our ID to refer to it.

"Our system retrieves information from the data sets in parallel and displays the result quickly."

B. Overview Of Proposed D 





1) Key-Matching Algorithm

We have a unique ID, which is called a key. Our key is matched with the key present in each data set. Once a match is found, the corresponding data is displayed; otherwise, the search continues until a match is found. For that, we use a searching algorithm.

2) Searching Algorithm 

Here we use a hierarchical model for searching keys. All data sets come under one node; once we follow that node, we can find our data quickly and easily.

3) Parallel Processing Algorithm 

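Once we enter our ID, the system should search all data sets in parallel and show the result on the main page. This is very hard to implement. To solve this problem we need high computation power; from our point of view, grid computing solves this problem, because it has good computation power, so it can search all the data sets in parallel and give the result to the main system.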



C. Efficient Process / Work 



1) Key-Matching Algorithm



A unique ID is called a key. It is matched with the key present in each data set.
ID = a1 a2 a3 a4 a5 a6 a7 a8 a9 a10



Once a match is found, the corresponding data is displayed; otherwise, the search continues until a match is found, so we use a searching algorithm for that.

Int key_id;

If (key_id == key_id_in_DB) then Match found;
Else Search continues until match found;
End;

When we enter this ID in our application, it should match the corresponding ID present in the particular data set, because the ID is stored in the data set in the same form. So searching is easy when we use binary search. Because all the data in a data set is kept in hierarchical form, once the top element matches, we move down in hierarchical order.
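Binary search, as mentioned above, only works when the IDs are kept in sorted order. A minimal sketch with Python's standard `bisect` module, assuming IDs are stored as fixed-width digit strings (so string order equals numeric order); `find_id` and the sample IDs are hypothetical.

```python
from bisect import bisect_left


def find_id(sorted_ids, key):
    """Binary search for key in a sorted list; return its index or -1."""
    i = bisect_left(sorted_ids, key)  # leftmost position where key could go
    if i < len(sorted_ids) and sorted_ids[i] == key:
        return i   # match found
    return -1      # no match in this data set


ids = sorted(["3305021107", "3305021108", "3306010101", "2901020304"])
print(find_id(ids, "3305021108"))  # 2
print(find_id(ids, "3305021109"))  # -1
```

Each lookup costs O(log n) comparisons, but the list must be kept sorted as records are added.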



Once the address is found, further steps become easier.



Here we first match a1 with a1 for the state; once it matches, we move into that state's districts only, not other districts. Similarly, once the district matches, we move into the corresponding taluks only, and so on.

Once our ID matches up to a9, we have found the address of the particular holder.
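The level-by-level matching described above can be sketched as a walk down a nested dictionary: each level looks only at the children of the node matched at the previous level, so other states' districts are never examined. The nesting, codes, and the `descend` helper are hypothetical.

```python
def descend(tree, path):
    """Follow codes level by level (state, district, taluk, ...).

    Returns the node reached, or None as soon as one level has no match,
    so sibling branches (e.g. other states' districts) are never visited.
    """
    node = tree
    for code in path:
        if not isinstance(node, dict) or code not in node:
            return None
        node = node[code]
    return node


# Hypothetical hierarchy: state -> district -> taluk -> village -> street -> door.
tree = {
    "33": {"05": {"02": {"11": {"07": {"12": "holder record 1",
                                        "13": "holder record 2"}}}}},
    "29": {"01": {}},
}
print(descend(tree, ["33", "05", "02", "11", "07", "13"]))  # holder record 2
print(descend(tree, ["29", "02"]))                          # None
```

Each step is a hash lookup, so the cost grows with the number of levels rather than with the number of records.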



2) Searching Algorithm

Here we use binary and parallel search algorithms for searching keys. All data comes under one node; once we follow that node, we can find our data quickly and easily.



ID = a1 a2 a3 a4 a5 a6 a7 a8 a9 a10



We can search data by using keys, because we have built the data sets around keys only. This keeps searching as easy as possible.



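If (ID.a1 == DB.a1) Go to specific state;
If (ID.a2 == DB.a2) Go to specific district;
If (ID.a3 == DB.a3) Go to specific taluk;
If (ID.a4 == DB.a4) Go to specific village;
If (ID.a5 == DB.a5) Go to specific street;
If (ID.a6 == DB.a6) Go to specific door;
End;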






Once we identify the address of the holder with the key-matching algorithm, i.e., once the first 9 digits have been processed, one digit remains; its meaning varies with the data set.



In the driving license data set this field is a10 (license number); likewise a11 is the ration card number, a12 the PAN card number, and a13 the passport number.



The last digit is optional; if we need any of the above, we have to go to the corresponding data set individually. And if the passport number matches our ID, the corresponding holder's phone number, mobile number, and mail ID can also be retrieved.

3) Parallel Processing Algorithm




Once we enter our ID, the system should search all data sets in parallel and show the result on the main page. This is very hard to implement. To solve this problem we need high computation power; from my point of view, a cluster system can solve our problem, because it has good computation power, so it can search all these data sets in parallel and give the result to the main system. [2], [12], [6]
Int key_id;

If (ID.a10 == DB.a10 in license dataset) Retrieve license number;
If (ID.a11 == DB.a11 in ration dataset) Retrieve ration card number;
If (ID.a12 == DB.a12 in PAN card dataset) Retrieve PAN card number;
If (ID.a13 == DB.a13 in passport dataset) Retrieve passport number;

Display all values in system;
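The parallel step can be sketched with Python's standard `concurrent.futures`: one task per data set looks up the holder's nine-field prefix concurrently, and missing entries are reported as "Not available", so the result also shows when a holder has no passport or PAN card. In-memory dictionaries stand in for the remote data sets; in the paper's setting each lookup would run on a separate grid or cluster node. All names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data sets keyed by the nine common fields (a1..a9) joined as a string.
KEY = "33|05|02|11|07|12|RAVI|19900101|M"
DATASETS = {
    "a10 license":  {KEY: "TN0520190001"},
    "a11 ration":   {KEY: "RC-778812"},
    "a12 PAN card": {},
    "a13 passport": {KEY: "P1234567"},
}


def query(name, prefix):
    """Look up one data set; in a grid this would be a remote call."""
    return name, DATASETS[name].get(prefix, "Not available")


def parallel_lookup(prefix):
    """Query every data set concurrently and collect the results in order."""
    with ThreadPoolExecutor(max_workers=len(DATASETS)) as pool:
        futures = [pool.submit(query, name, prefix) for name in DATASETS]
        return dict(f.result() for f in futures)


for field, value in parallel_lookup(KEY).items():
    print(f"{field}: {value}")
```

Threads suit lookups that mostly wait on the network; CPU-heavy matching inside one process would instead call for `ProcessPoolExecutor` or distribution across nodes.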

When we enter our ID, the first 9 digits give the holder's information, including name, gender, address, and so on. Further digits indicate the driving license, ration card number, PAN card number, and passport number. If the holder does not have a passport, the result should show that. Similarly, if the user does not have a PAN card, the result should show that also. Once we enter our ID, it should be matched against all data sets in parallel to produce the result: our key searches the keys in all data sets, matches them, and finally shows the result.

4) Security


User authentication in grid computing resources is one of the fundamental procedures for ensuring secure communication and sharing system resources over an insecure public network channel. In particular, the aim of a one-time password is to make it harder to gain unauthorized access to restricted resources. Rather than using a password file as in conventional authentication systems, various one-time password schemes use smart cards, time-synchronized tokens, or the short message service in order to reduce risk and maintenance cost. However, such hardware-based schemes are impractical because of the far from ubiquitous hardware devices or the infrastructure requirements. To remedy these weaknesses, the QR code technique is introduced into our one-time password authentication protocol. This methodology can protect the resources and data sets from the user and third-party users; at each level it verifies the user key and validates it, for more security of the systems / grid computing resources. [16], [17]
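The paper delivers the one-time password through a QR code but does not say how the password itself is generated. As an assumption, the sketch below uses HOTP (RFC 4226), a standard HMAC-based one-time password algorithm, built only from Python's `hmac`, `hashlib`, and `struct`; rendering the code as a QR image is omitted. The `verify` helper and its look-ahead window are hypothetical.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password as specified in RFC 4226."""
    msg = struct.pack(">Q", counter)                   # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                         # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, candidate: str, counter: int, window: int = 3):
    """Accept candidate if it matches one of the next `window` counters.

    Returns the counter value to store for the next login, or None.
    compare_digest avoids leaking information through timing.
    """
    for c in range(counter, counter + window):
        if hmac.compare_digest(hotp(secret, c), candidate):
            return c + 1
    return None


secret = b"12345678901234567890"         # RFC 4226 test key
print(hotp(secret, 0), hotp(secret, 1))  # 755224 287082
print(verify(secret, "359152", 0))       # 3
```

Because the stored counter advances after each successful login, a captured code cannot be replayed.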
V. Working Model and Simulation
Fig. 4 Working model application ("Authority Card" form with an ID field and result fields Driving License, PAN Card No, Ration Card No)

After entering the user ID, we click the Submit button. The back-end system then searches all corresponding information in all data sets in parallel and produces the result.




Fig.5 Data search from multiple data sets (parallel)
Experimental results on simulated data sets (from the real data, randomly picked 1,000 vs. 1,000, 2,000 vs. 2,000, and 3,000 vs. 3,000)

Algorithm          Completeness   Accuracy   Time (ms)   Existing solution compared:
                                                         Completeness   Accuracy
Key match          98.40%         96.10%     14593       97.70%         95.40%
Search             98.40%         96.40%     13515       97.70%         95.40%
Parallel process   98.40%         96.90%     11422       97.70%         95.40%

Fig. 6 Efficiency (in terms of accuracy, tested with sample data)
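From the Time (ms) column of the table, the relative time reduction of the parallel process and of the search run over the key-match run works out as follows (simple arithmetic on the reported values, not an additional result):

```latex
\frac{14593 - 11422}{14593} = \frac{3171}{14593} \approx 0.217 \;(\approx 21.7\%),
\qquad
\frac{14593 - 13515}{14593} = \frac{1078}{14593} \approx 0.074 \;(\approx 7.4\%)
```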

A. 



From the real data, we randomly picked 1,000 vs. 1,000, 2,000 vs. 2,000, and 3,000 vs. 3,000 data sets as three groups of linkage tests. Accuracy and completeness are defined stringently to show the performance of the methods.
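The paper does not give formulas for completeness and accuracy. One common reading in record-linkage evaluation, assumed here, treats completeness as the fraction of true matching pairs that were found (recall) and accuracy as the fraction of reported pairs that are correct (precision). A minimal sketch with hypothetical pair sets; `linkage_metrics` is an illustrative name.

```python
def linkage_metrics(found, truth):
    """Completeness (recall) and accuracy (precision) of a set of linked pairs.

    found: set of (left_id, right_id) pairs reported by the linker.
    truth: set of pairs that truly refer to the same person.
    Empty denominators are treated as perfect scores (a convention choice).
    """
    correct = found & truth
    completeness = len(correct) / len(truth) if truth else 1.0
    accuracy = len(correct) / len(found) if found else 1.0
    return completeness, accuracy


truth = {(1, 1), (2, 2), (3, 3), (4, 4)}
found = {(1, 1), (2, 2), (3, 4)}
comp, acc = linkage_metrics(found, truth)
print(f"completeness={comp:.2%} accuracy={acc:.2%}")  # completeness=50.00% accuracy=66.67%
```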



VII. Conclusion and Future Work 



In the proposed method, we have processed large data sets from multiple resources simultaneously. The procedure produced results within the estimated time or less. Integrating multiple data sets in this way is efficient, and the multiple data sets are computed by the grid infrastructure environment.



We plan to design and develop the framework of this method for grid computing resources as a tool with security features (QR code). This framework will be embedded in the grid computing infrastructure as a software API to improve heterogeneous data manipulation of grid databases, and it will perform fast, parallel computation efficiently across all grid resources.